Modeling of COVID-19 vaccination rate using odd Lomax inverted Nadarajah-Haghighi distribution

Since the spread of the COVID-19 pandemic in early 2020, modeling the related factors has become essential, requiring new families of statistical distributions to be formulated. In the present paper we are interested in modeling the vaccination rate in some African countries. The recorded data in these countries show low vaccination rates, which will affect the spread of new active cases and increase the mortality rate. A new four-parameter extension of the inverted Nadarajah-Haghighi distribution is considered, obtained by combining the inverted Nadarajah-Haghighi distribution with the odd Lomax-G family. The proposed distribution is called the odd Lomax inverted Nadarajah-Haghighi (OLINH) distribution. It has several attractive statistical properties, such as a simple linear representation of the density function and flexible shapes of the hazard rate curve and the odds ratio of failure, in addition to other properties related to the quantile function, the rth moment, the moment generating function, the Rényi entropy, and the density of order statistics. We address the problem of parameter estimation from the frequentist and Bayesian approaches, and compare the performance of the two estimation methods using simulation analysis and numerical techniques. Finally, different goodness-of-fit measures are used for modeling the COVID-19 vaccination rate, and they indicate that the OLINH distribution fits these data better than other competitive distributions.


Introduction
The amount of data available for analysis has been growing steadily, requiring new statistical distributions that enable us to describe every phenomenon under study. Modeling observations using probability distributions is one of the most essential tasks that statisticians must handle. Many scientific fields, such as medicine, engineering, and finance, require statistical models to describe the trend and to predict the future behaviour of their data. Many lifetime models have therefore been employed in the literature to describe various forms of survival data. Because the quality of a statistical analysis depends strongly on the flexibility and characteristics of the chosen model, significant effort has been devoted to constructing new families of distributions. Still, there is a persistent need to create new models or formulate new extensions that achieve a better fit to real lifetime data. Tahir et al. [1] proposed the inverted Nadarajah-Haghighi (INH) distribution, which is an inverted model with decreasing and unimodal (right-skewed) density shapes, and with decreasing and upside-down bathtub (UBT) hazard rate shapes. They addressed several statistical features of the INH distribution and used various frequentist approaches to estimate the model's parameters. They demonstrated the suitability of the INH distribution on real-life data sets, and found that the INH model provided a better fit than other well-known lifetime models such as the inverted exponential, the inverted gamma, the inverted Weibull, and the inverted Lindley, among others.
Several researchers have addressed the applications of inverted distributions; one can refer to Folks and Chhikara [2], Rosaiah and Kantam [3], De Gusmao et al. [4], Joshi and Kumar [5], Almetwally [6], Ibrahim and Almetwally [7], Ramos et al. [8], Almetwally [9], Hassan et al. [10], and Basheer et al. [11], among others. Some generalizations of the INH distribution have been introduced in the literature; for example, the Marshall-Olkin INH distribution was studied by Raffiq et al. [12], and Toumaj et al. [13] proposed the transmuted INH distribution. Elshahhat and Rastogi [14] discussed parameter estimation of lifetime for the INH distribution with Type-II progressively censored samples. Still, there is room for new generalizations and extensions of the INH distribution, and the extension proposed here is shown to outperform the original INH distribution and other competitive models, especially for modeling the COVID-19 vaccination rate. Let x be a random variable that follows the inverted Nadarajah-Haghighi (INH) distribution with parameter vector Θ = (δ, θ), where δ, θ > 0. The CDF and pdf are as follows:
$$G(x;\Theta) = \exp\left\{1 - \left(1 + \frac{\theta}{x}\right)^{\delta}\right\}, \quad x > 0, \tag{1}$$
and
$$g(x;\Theta) = \delta\theta\, x^{-2}\left(1 + \frac{\theta}{x}\right)^{\delta-1}\exp\left\{1 - \left(1 + \frac{\theta}{x}\right)^{\delta}\right\}. \tag{2}$$
In this work we introduce a new extension of the INH distribution with four parameters, namely the odd Lomax INH (OLINH) distribution, based on the odd Lomax-G (OL-G) family introduced by Cordeiro et al. [15]. Adding parameters to the original distribution makes it more flexible and more reliable for modeling some real-life data.
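As a minimal sketch, the INH baseline can be evaluated numerically as follows. The code assumes the INH CDF G(x) = exp{1 − (1 + θ/x)^δ} and the corresponding pdf; the function names `inh_cdf` and `inh_pdf` are illustrative and do not come from the paper. No guard is included against overflow for extremely small x.

```python
import math


def inh_cdf(x, delta, theta):
    """Assumed INH CDF: G(x) = exp{1 - (1 + theta/x)^delta} for x > 0."""
    if x <= 0:
        return 0.0
    return math.exp(1.0 - (1.0 + theta / x) ** delta)


def inh_pdf(x, delta, theta):
    """Derivative of the assumed INH CDF with respect to x."""
    if x <= 0:
        return 0.0
    u = 1.0 + theta / x
    return delta * theta / x**2 * u ** (delta - 1.0) * math.exp(1.0 - u**delta)


if __name__ == "__main__":
    # Illustrative parameter values only (not estimates from the paper).
    for x in (0.5, 1.0, 2.0, 5.0):
        print(x, inh_cdf(x, 1.5, 2.0), inh_pdf(x, 1.5, 2.0))
```

A quick consistency check is to compare `inh_pdf` with a finite-difference derivative of `inh_cdf` at a few points.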
Let $g(x;\Theta) = dG(x;\Theta)/dx$ be the pdf of a baseline model with parameter vector Θ; then the CDF of the OL-G family is given by:
$$F(x;\Omega) = 1 - \beta^{\alpha}\left[\beta + \frac{G(x;\Theta)}{1 - G(x;\Theta)}\right]^{-\alpha}, \tag{3}$$
where Ω = (α, β, Θ) is the vector of parameters of the OL-G family. The pdf corresponding to (3) is defined by
$$f(x;\Omega) = \frac{\alpha\beta^{\alpha}\, g(x;\Theta)}{\left[1 - G(x;\Theta)\right]^{2}}\left[\beta + \frac{G(x;\Theta)}{1 - G(x;\Theta)}\right]^{-(\alpha+1)}, \tag{4}$$
where α, β > 0 are shape parameters. A random variable with pdf (4) is denoted by X ∼ OL-G(Ω). New extended four-parameter Weibull, Lomax, log-logistic, and log-Lindley distributions, called the OL-Weibull, OL-Lomax, OL-log-logistic, and OL-log-Lindley distributions respectively, were introduced by Cordeiro et al. [15]. The odd Lomax-exponential distribution was introduced by Ogunsanya et al. [16]. Yakura et al. [17] introduced the Lomax-Kumaraswamy distribution. Marzouk et al. [18] obtained a generalized odd Lomax family of distributions with applications. The extended odd Lomax family of distributions was described by Abubakari et al. [19].
The main aim of this work is to study the statistical properties of the new extended model and to investigate point and interval estimation for its four unknown parameters. Two estimation methods are considered: maximum likelihood and Bayesian estimation. To verify the efficiency of the proposed estimation methods and to study how the estimators perform for various sample sizes and parameter values, a simulation study is carried out in R. A real data example illustrates the suitability of the OLINH model over the INH and other competitive models with two and three parameters. The rest of this article is organized as follows. The OLINH distribution is defined in Section 2. In Section 3, some statistical properties of the OLINH distribution are obtained. Section 4 studies two methods of estimation. To judge the efficiency of these estimation methods, a simulation study is performed in Section 5. An application to COVID-19 vaccination rate data from 46 different African countries is considered in Section 6 for illustrative purposes. Finally, conclusions are provided in Section 7.

OLINH distribution
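Consider the OL-G family with the INH distribution as the baseline; this generates the four-parameter OLINH distribution. Substituting the INH CDF and pdf, (1) and (2), into the OL-G family, (3) and (4), and writing $G(x) = \exp\{1 - (1 + \theta/x)^{\delta}\}$ for the INH CDF, the CDF and pdf of the OLINH distribution are obtained as:
$$F(x;\Omega) = 1 - \beta^{\alpha}\left[\beta + \frac{G(x)}{1 - G(x)}\right]^{-\alpha} \tag{5}$$
and
$$f(x;\Omega) = \frac{\alpha\beta^{\alpha}\,\delta\theta\, x^{-2}\left(1 + \theta/x\right)^{\delta-1} G(x)}{\left[1 - G(x)\right]^{2}}\left[\beta + \frac{G(x)}{1 - G(x)}\right]^{-(\alpha+1)}, \tag{6}$$
respectively, where x > 0 and α, β, δ, θ > 0. A random variable with pdf (6) is denoted by X ∼ OLINH(α, β, δ, θ). The hazard rate function (hrf) of the OLINH distribution, $h(x) = f(x)/[1 - F(x)]$, is given by
$$h(x;\Omega) = \frac{\alpha\,\delta\theta\, x^{-2}\left(1 + \theta/x\right)^{\delta-1} G(x)}{\left[1 - G(x)\right]^{2}}\left[\beta + \frac{G(x)}{1 - G(x)}\right]^{-1}.$$
The odds ratio of failure (ORF) of the OLINH distribution, $F(x)/[1 - F(x)]$, is obtained as
$$\mathrm{ORF}(x;\Omega) = \left[1 + \frac{G(x)}{\beta\,(1 - G(x))}\right]^{\alpha} - 1.$$
Figs 1 and 2 show the shapes of the OLINH pdf and hrf, respectively, for different parameter values. The density of the OLINH distribution can be right-skewed or reversed-J shaped. The hrf of the OLINH distribution has some interesting shapes, such as decreasing and upside-down bathtub. This variety of hrf shapes makes the distribution appealing for modeling lifetime data in areas such as biomedical and biological studies, reliability analysis, physical engineering, and survival analysis.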
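The construction above can be sketched in code by composing the OL-G family with the INH baseline. This is a minimal illustration that assumes the INH CDF G(x) = exp{1 − (1 + θ/x)^δ} and the OL-G CDF F = 1 − β^α[β + G/(1 − G)]^(−α); all function names are illustrative, and the sketch does not guard against G(x) rounding to 1 for very large x.

```python
import math


def inh_cdf(x, delta, theta):
    """Assumed INH baseline CDF."""
    return math.exp(1.0 - (1.0 + theta / x) ** delta)


def inh_pdf(x, delta, theta):
    """Assumed INH baseline pdf."""
    u = 1.0 + theta / x
    return delta * theta / x**2 * u ** (delta - 1.0) * math.exp(1.0 - u**delta)


def olinh_cdf(x, alpha, beta, delta, theta):
    """OLINH CDF: 1 - (beta / (beta + odds))^alpha with odds = G / (1 - G)."""
    if x <= 0:
        return 0.0
    G = inh_cdf(x, delta, theta)
    odds = G / (1.0 - G)
    return 1.0 - (beta / (beta + odds)) ** alpha


def olinh_pdf(x, alpha, beta, delta, theta):
    """OLINH pdf obtained by differentiating the CDF above."""
    if x <= 0:
        return 0.0
    G = inh_cdf(x, delta, theta)
    g = inh_pdf(x, delta, theta)
    odds = G / (1.0 - G)
    return alpha * beta**alpha * g / (1.0 - G) ** 2 * (beta + odds) ** (-(alpha + 1.0))


def olinh_hrf(x, alpha, beta, delta, theta):
    """Hazard rate in closed form: alpha * g / ((1 - G)^2 * (beta + odds))."""
    G = inh_cdf(x, delta, theta)
    g = inh_pdf(x, delta, theta)
    odds = G / (1.0 - G)
    return alpha * g / ((1.0 - G) ** 2 * (beta + odds))


if __name__ == "__main__":
    params = (2.0, 0.5, 1.5, 2.0)  # illustrative (alpha, beta, delta, theta)
    for x in (0.5, 1.0, 2.0, 4.0):
        print(x, olinh_cdf(x, *params), olinh_pdf(x, *params), olinh_hrf(x, *params))
```

Evaluating `olinh_hrf` on a grid for several parameter choices is one way to explore the decreasing and upside-down bathtub shapes described in the text.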

Statistical characteristics of OLINH distribution
In this section, we derive some statistical characteristics of the OLINH distribution, such as the linear representation of its pdf, the quantile function, the moments, the moment generating function (MGF), the Rényi entropy, and the density of order statistics.

Linear representation
According to Cordeiro et al. [15], the density of the OL-G family admits a linear representation, given by Eq (7), and the cumulative distribution function of the OL-G family admits the analogous linear representation given by Eq (8). Using Eq (7), the linear representation of the OLINH pdf is written as Eq (9), in which $h_{k+j+1}(x)$ denotes the exponentiated INH density with power (k + j + 1). Using Eq (8), we obtain the linear representation of the CDF of the OLINH distribution, Eq (10). The linear representations of the pdf and CDF of the OLINH distribution are valuable when finding the moments, the moment generating function, the Rényi entropy, and the density of order statistics.
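For reference, the building block of these representations can be written out explicitly. The exponentiated-G density with power a > 0 has the standard form a g(x) G(x)^(a−1); assuming the INH baseline G(x) = exp{1 − (1 + θ/x)^δ}, it simplifies as shown below. The coefficients of the linear combination are those of Cordeiro et al. [15] and are not reproduced here.

```latex
h_{a}(x) = a\, g(x)\, G(x)^{a-1}
         = a\,\delta\theta\, x^{-2}\left(1+\frac{\theta}{x}\right)^{\delta-1}
           \exp\left\{ a\left[ 1-\left(1+\frac{\theta}{x}\right)^{\delta} \right] \right\},
\qquad x>0 .
```

The simplification uses G(x)^(a−1) g(x) = δθ x^(−2) (1 + θ/x)^(δ−1) G(x)^a, so the power a only rescales the exponent.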

Quantile for the OLINH distribution
The quantile function of a distribution provides important measures of location and is commonly used to generate random samples in simulation studies. Let $x_q = Q(q) = F^{-1}(q;\Omega)$ for 0 < q < 1. For the OLINH distribution, the quantile function is obtained by inverting Eq (5), which gives
$$x_q = \theta\left\{\left[1 - \log\left(\frac{w_q}{1 + w_q}\right)\right]^{1/\delta} - 1\right\}^{-1}, \qquad w_q = \beta\left[(1 - q)^{-1/\alpha} - 1\right]. \tag{11}$$
In particular, the three quartiles Q1, Q2, and Q3 are obtained by setting q = 0.25, 0.5, and 0.75, respectively, in Eq (11). Quantile-based measures of skewness and kurtosis can also be obtained from this equation.
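The inversion can be sketched as follows. Assuming the OLINH CDF F = 1 − β^α[β + G/(1 − G)]^(−α) with G(x) = exp{1 − (1 + θ/x)^δ}, solving F(x) = q first gives the odds w = β[(1 − q)^(−1/α) − 1], then G = w/(1 + w), and finally x = θ/{[1 − log G]^(1/δ) − 1}. The function names are illustrative, and `log1p`/`expm1` are used only for numerical stability near q = 0 and q = 1.

```python
import math
import random


def olinh_cdf(x, alpha, beta, delta, theta):
    """Assumed OLINH CDF (INH baseline inside the odd Lomax-G family)."""
    G = math.exp(1.0 - (1.0 + theta / x) ** delta)
    odds = G / (1.0 - G)
    return 1.0 - (beta / (beta + odds)) ** alpha


def olinh_quantile(q, alpha, beta, delta, theta):
    """Invert the assumed OLINH CDF at probability q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    t = -math.log1p(-q) / alpha          # -log(1 - q) / alpha
    w = beta * math.expm1(t)             # odds G / (1 - G)
    neg_log_G = math.log1p(1.0 / w)      # -log(w / (1 + w))
    # (1 - log G)^(1/delta) - 1, written with expm1/log1p for stability
    denom = math.expm1(math.log1p(neg_log_G) / delta)
    return theta / denom


def olinh_sample(n, alpha, beta, delta, theta, rng=None):
    """Inverse-transform sampling: feed uniform draws through the quantile."""
    rng = rng or random.Random()
    sample = []
    while len(sample) < n:
        q = rng.random()
        if q > 0.0:  # random() may return exactly 0.0
            sample.append(olinh_quantile(q, alpha, beta, delta, theta))
    return sample


if __name__ == "__main__":
    params = (2.0, 0.5, 1.5, 2.0)  # illustrative (alpha, beta, delta, theta)
    print([olinh_quantile(q, *params) for q in (0.25, 0.5, 0.75)])
    print(olinh_sample(5, *params, rng=random.Random(1)))
```

The same sampler can serve as the data generator in a simulation study, with the three quartiles giving a quick summary of location and spread.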

Moments for the OLINH distribution
Let x be a random variable following the OLINH distribution. The rth moment of x follows from Eq (9) by applying a power series expansion and some algebraic manipulation. The ordinary moments are useful in evaluating the skewness and kurtosis values; see Fig 3. The rth incomplete moment and the moment generating function of the OLINH distribution are obtained from the same representation.

Rényi entropy
Rényi entropy is an extension of Shannon entropy. The Rényi entropy of order z is defined as $\frac{1}{1-z}\log\int_{0}^{\infty} f(x)^{z}\,dx$, for $z > 0$ and $z \neq 1$. Using the OLINH density from Eq (6), and applying a power series expansion, integration, and some algebraic simplification, an explicit expression for the Rényi entropy of the OLINH distribution is obtained. Fig 4 shows the Rényi entropy of the OLINH model for some parameter values and different values of z. Rényi entropy has many applications; for more information see [20-23]. From this figure, we note that the Rényi entropy decreases as z increases.

Order statistics
Let $x_1, \ldots, x_n$ be a random sample of size n drawn from a continuous distribution with pdf f(x), and let $x_{1:n} < x_{2:n} < \ldots < x_{n:n}$ be the corresponding order statistics. If the random sample follows the OLINH distribution, then from Eqs (8) and (9) the pdf of the kth order statistic $x_{k:n}$ can be represented as a linear combination of exponentiated INH densities $h_{t+u+1}(x)$. Hence many statistical properties of the order statistics can be derived easily from the properties of $h_{t+u+1}(x)$.
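The starting point for this representation is the standard density of the kth order statistic, which holds for any continuous distribution with pdf f and CDF F. Expanding the survival term with the binomial theorem gives a sum of terms of the form f(x)F(x)^(k+j−1), into which the linear representations of the OLINH pdf and CDF can then be substituted. The index j below is introduced for illustration and need not match the paper's indices t and u.

```latex
f_{k:n}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, f(x)\, F(x)^{k-1}\, \bigl[1-F(x)\bigr]^{n-k}
           = \frac{n!}{(k-1)!\,(n-k)!} \sum_{j=0}^{n-k} (-1)^{j} \binom{n-k}{j}\, f(x)\, F(x)^{k+j-1} .
```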

Estimation methods
The estimation of the OLINH distribution parameters is studied in this section using two methods: maximum likelihood estimation (MLE) and Bayesian estimation based on the squared error loss function.

Maximum likelihood estimation
Let $X_1, \ldots, X_n$ be a random sample from the OLINH distribution with parameters α, β, δ and θ. The log-likelihood l(Ω) of the OLINH distribution is the sum of the logarithms of the pdf (6) evaluated at the observations. To maximize the log-likelihood, we take the partial derivatives of l(Ω) with respect to the model parameters α, β, δ and θ and equate them to zero, which yields a system of nonlinear equations, where A_i(δ, θ) = 1 … It is possible to obtain the MLE of α, denoted $\hat{\alpha}$, explicitly from Eq (15) as a function of $\hat{\beta}$, $\hat{\delta}$ and $\hat{\theta}$, the MLEs of β, δ and θ respectively. These are obtained numerically by solving the above system using techniques such as the Newton-Raphson method; R packages are used for that purpose.
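A minimal sketch of the likelihood computation is given below. It assumes the OLINH pdf f = αβ^α g (1 − G)^(−2) [β + G/(1 − G)]^(−(α+1)) with the INH baseline G(x) = exp{1 − (1 + θ/x)^δ}, so that the log-likelihood is n log α + nα log β + Σ log g(x_i) − 2 Σ log(1 − G_i) − (α + 1) Σ log(β + G_i/(1 − G_i)). Setting the derivative with respect to α to zero then gives the profile estimate α̂ = n / Σ log(1 + G_i/(β(1 − G_i))). The function names and the data values are illustrative only; the remaining parameters would still have to be found with a numerical optimizer.

```python
import math


def _inh_terms(x, delta, theta):
    """Return (G, log g) for the assumed INH baseline at a single point x > 0."""
    u = 1.0 + theta / x
    log_G = 1.0 - u**delta
    log_g = (math.log(delta) + math.log(theta) - 2.0 * math.log(x)
             + (delta - 1.0) * math.log(u) + log_G)
    return math.exp(log_G), log_g


def olinh_loglik(data, alpha, beta, delta, theta):
    """Log-likelihood of the assumed OLINH model for a list of positive values."""
    n = len(data)
    total = n * math.log(alpha) + n * alpha * math.log(beta)
    for x in data:
        G, log_g = _inh_terms(x, delta, theta)
        odds = G / (1.0 - G)
        total += log_g - 2.0 * math.log(1.0 - G) - (alpha + 1.0) * math.log(beta + odds)
    return total


def alpha_hat(data, beta, delta, theta):
    """Closed-form maximizer of the log-likelihood in alpha, other parameters fixed."""
    s = 0.0
    for x in data:
        G, _ = _inh_terms(x, delta, theta)
        s += math.log1p(G / (beta * (1.0 - G)))
    return len(data) / s


if __name__ == "__main__":
    data = [0.5, 1.2, 2.0, 3.5, 0.9]  # made-up values, not the vaccination data
    a = alpha_hat(data, 0.7, 1.3, 1.5)
    print(a, olinh_loglik(data, a, 0.7, 1.3, 1.5))
```

Profiling out α in this way reduces the numerical search to three parameters (β, δ, θ), which can make Newton-type iterations easier to start.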

Bayesian estimation
The Bayesian approach treats the parameters as random variables with a given prior distribution. The ability to incorporate prior knowledge makes the Bayesian method very useful in survival analysis, where one of the main problems is the limited availability of data. For the parameters α, β, δ and θ we suggest independent gamma priors, Gamma(μ_1, ν_1), Gamma(μ_2, ν_2), Gamma(μ_3, ν_3), and Gamma(μ_4, ν_4), respectively. Hence the joint prior density function can be written as follows:
$$P(\Omega) \propto \alpha^{\mu_1 - 1}\beta^{\mu_2 - 1}\delta^{\mu_3 - 1}\theta^{\mu_4 - 1} e^{-(\nu_1\alpha + \nu_2\beta + \nu_3\delta + \nu_4\theta)}. \tag{19}$$
The joint posterior density function of Ω is proportional to the product of the likelihood function and the joint prior, and is given by Eq (20). Under the squared error loss function, the Bayes estimator of Ω, denoted $\tilde{\Omega}$, is the posterior mean, given by Eq (21). It is worth noting that the integrals in Eq (21) are evaluated using the MCMC method, based on the conditional posterior densities of the parameters. For example, the conditional posterior of α is a gamma distribution, $\alpha \sim \mathrm{Gamma}(n + \mu_1, \nu_1 + \ldots)$, and the conditional posterior of δ satisfies $P(\delta \mid \alpha, \beta, \theta, x) \propto \delta^{n + \mu_3 - 1} e^{-\nu_3\delta} \cdots$
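One step of such a sampler can be sketched for α. Under the gamma prior Gamma(μ_1, ν_1) and the assumed OLINH likelihood (INH baseline G(x) = exp{1 − (1 + θ/x)^δ} inside the odd Lomax-G family), the terms involving α form a gamma kernel with shape n + μ_1 and rate ν_1 + Σ log(1 + G_i/(β(1 − G_i))). The rate term is derived here from the assumed likelihood rather than copied from the paper, and the function names and data values are illustrative. The other parameters do not have standard conditionals and would need, for example, a Metropolis-Hastings step.

```python
import math
import random


def alpha_conditional(data, beta, delta, theta, mu1, nu1):
    """Shape and rate of the gamma full conditional of alpha (assumed model)."""
    s = 0.0
    for x in data:
        G = math.exp(1.0 - (1.0 + theta / x) ** delta)  # assumed INH CDF
        s += math.log1p(G / (beta * (1.0 - G)))
    return len(data) + mu1, nu1 + s


def draw_alpha(data, beta, delta, theta, mu1, nu1, rng):
    """Draw alpha from its full conditional; gammavariate takes shape and scale."""
    shape, rate = alpha_conditional(data, beta, delta, theta, mu1, nu1)
    return rng.gammavariate(shape, 1.0 / rate)


if __name__ == "__main__":
    data = [0.5, 1.2, 2.0, 3.5, 0.9]  # made-up values, not the vaccination data
    rng = random.Random(0)
    draws = [draw_alpha(data, 0.7, 1.3, 1.5, 2.0, 1.0, rng) for _ in range(5000)]
    # Under squared error loss the Bayes estimate is the posterior mean;
    # here beta, delta, theta are held fixed purely for illustration.
    print(sum(draws) / len(draws))
```

Averaging the retained draws approximates the posterior mean, which is the Bayes estimator under the squared error loss function.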

Simulation
In this section, Monte-Carlo simulation is used to compare the two estimation methods: the MLE and the Bayesian method under the squared error loss function. The simulation analysis uses the MCMC method for estimating the parameters of the OLINH lifetime distribution and is carried out in R with 5000 iterations. Random samples are generated from the OLINH distribution, where x represents the OLINH lifetime, for several true parameter values and sample sizes n = 30, 80, and 150. Asymptotic confidence intervals are obtained for the MLE, and Bayesian credible intervals are obtained as highest posterior density intervals (HPDI). The best estimation method is the one that minimizes the estimator's relative bias (RB), mean squared error (MSE), and length of the confidence interval (L.CI). Tables 1-3 summarize the simulation results for point and interval estimation using the methods discussed in this paper. The RB, MSE, and L.CI values are used to compare the point and interval estimation methods. The following conclusions are drawn from these tables:

PLOS ONE
1. The RB, MSE, and L.CI decrease as n increases for actual parameters of the OLINH distribution.
2. Bayesian estimation is the best estimation method.
3. The Bayesian credible intervals obtained by HPDI are the shortest intervals for the parameters of the OLINH distribution.
5. For fixed α, δ, θ, and sample size, the RB, MSE, and L.CI increase as β increases, in almost all cases.
6. For fixed β, δ, θ, and sample size, the RB, MSE, and L.CI increase as α increases, in almost all cases.
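The comparison criteria used in these conclusions can be computed from the replicated estimates as sketched below. RB is taken here as the average of (estimate − true value)/true value and MSE as the average squared deviation from the true value; the paper's exact definitions (for example, whether absolute values are used in RB) are not shown, so these are assumptions, as are the function names.

```python
def relative_bias(estimates, true_value):
    """Average signed deviation from the true value, relative to the true value."""
    return sum(e - true_value for e in estimates) / (len(estimates) * true_value)


def mean_squared_error(estimates, true_value):
    """Average squared deviation of the estimates from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def mean_interval_length(intervals):
    """Average length of (lower, upper) interval estimates."""
    return sum(upper - lower for lower, upper in intervals) / len(intervals)


if __name__ == "__main__":
    # Toy replicated estimates of a parameter whose true value is 1.0.
    estimates = [1.1, 0.9, 1.3]
    intervals = [(0.6, 1.6), (0.5, 1.4), (0.8, 1.9)]
    print(relative_bias(estimates, 1.0))
    print(mean_squared_error(estimates, 1.0))
    print(mean_interval_length(intervals))
```

In a full study these summaries would be computed per parameter, per estimation method, and per sample size, and then tabulated.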

Analysis of COVID-19 vaccination
COVID-19 vaccination rate data from 46 different countries in southern Africa are considered; some statistical measures are summarized in Table 4. Our goal is to model these rates using the OLINH distribution in order to describe their trend and to predict future values of the vaccination rate. For that purpose, several goodness-of-fit measures are used, and a comparison between our model and other competitive models is presented in Table 5. The goodness-of-fit measures are: the Kolmogorov-Smirnov statistic (KSS) with its P-value (KSP-value), Cramér-von … Table 5 shows that the OLINH distribution has the smallest values of all information measures among the distributions considered. The competitive distributions are: the extended odd Weibull inverse Nadarajah-Haghighi (EOWINH) (Almetwally [24]), exponential Lomax (EL) (El-Bassiouny et al. [25]), Kumaraswamy Weibull (KW) (Cordeiro et al. [26]), … [29]). As a result, we conclude that the OLINH distribution provides the best fit to the COVID-19 vaccination rate data set.
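As an illustration of one of these measures, the Kolmogorov-Smirnov statistic compares the empirical CDF of the data with a fitted CDF and takes the largest absolute difference. The sketch below is generic: `cdf` stands for any fitted CDF (in this application it would be the OLINH CDF evaluated at the estimated parameters), and the data shown are made up rather than the vaccination rates.

```python
def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF of data and the model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at the i-th ordered value.
        d = max(d, i / n - F, F - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Toy check against the Uniform(0, 1) CDF.
    def uniform_cdf(x):
        return min(max(x, 0.0), 1.0)

    print(ks_statistic([0.1, 0.4, 0.7], uniform_cdf))
```

A smaller statistic indicates closer agreement between the fitted model and the data, which is the sense in which competing distributions are ranked.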

Conclusion
A new extension of the INH distribution based on the odd Lomax-G family, called the OLINH distribution, is formulated in this paper. We studied its statistical properties and obtained a linear representation of its pdf; the quantile function, the moments, the moment generating function, and the Rényi entropy are also obtained. Point estimation of the unknown OLINH parameters α, β, δ, and θ was considered using the MLE and Bayesian estimation methods. Interval estimation of the parameters was considered using the asymptotic approximation of the MLE and Bayesian credible intervals. To compare the performance of the estimation methods, a Monte-Carlo simulation analysis was carried out in R. COVID-19 vaccination rate data were also considered, and the OLINH distribution was shown to fit these data better than other competitive distributions. Bayesian estimation performed better than the MLE for estimating the parameters of the OLINH distribution.